ED 319 788                                                  TM 015 041

AUTHOR        Marsh, Herbert W.; Hocevar, Dennis
TITLE         The Multidimensionality of Students' Evaluations of
              Teaching Effectiveness: The Generality of Factor
              Structures across Academic Discipline, Instructor
              Level, and Course Level.
PUB DATE      15 Jan 90
NOTE          18p.
PUB TYPE      Reports - Research/Technical (143)
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   College Curriculum; *College Faculty; *Factor
              Structure; *Graduate Students; Higher Education;
              Multidimensional Scaling; *Student Evaluation of
              Teacher Performance; *Teacher Effectiveness; Teaching
              Assistants; *Undergraduate Students
IDENTIFIERS   *Student Evaluation of Educational Quality;
              University of Southern California



ABSTRACT 

Factor analyses of student evaluations of teaching
effectiveness were conducted for 24,158 courses at the University of
Southern California and for each of 21 different subgroups derived
from the total group. All classes evaluated by six or more students
were included in the study. The subgroups were designed to differ in
terms of instructor level (teaching assistants or regular faculty),
course level (undergraduate or graduate), and academic discipline.
The same 9 factors that the Student Evaluation of Educational Quality
instrument was designed to measure were consistently identified in
each of the 22 different factor analyses, and all factor structures
were remarkably well-defined and consistent. Correlations between
factor scores based on the total group factor analysis and the 21
subgroup factor analyses were very high, and most were greater than
0.99. Because of the large number and diversity of classes in this
study, the results provide stronger support for the generality of the
factor structure underlying students' evaluations of teaching
effectiveness than does any previous research. Five data tables are
included. (TJH)






The Multidimensionality of Students' Evaluations of Teaching Effectiveness:
The Generality of Factor Structures Across Academic Discipline, Instructor
Level, and Course Level



Herbert W. Marsh 

University of Sydney, NSW Australia 
Dennis Hocevar 

University of Southern California 



15 January, 1990 
Running Head: Multidimensionality of Students' Evaluations






ABSTRACT 



Factor analyses of student evaluations of teaching effectiveness were
conducted for a total group of 24,158 courses and for each of 21 different
subgroups derived from the total group. The subgroups were constructed to
differ in terms of instructor level (teaching assistants or regular
faculty), course level (undergraduate or graduate), and academic discipline.
The same nine factors that the Student Evaluation of Educational Quality
(SEEQ) instrument was designed to measure were consistently identified in
each of the 22 different factor analyses and all factor structures were
remarkably well defined and consistent. Correlations between factor scores
based on the total group factor analysis and the 21 subgroup factor analyses
were very high and most were greater than .99. Because of the large number
and diversity of classes in this study, the results provide stronger support
for the generality of the factor structure underlying students' evaluations
of teaching effectiveness than does any previous research.




Information from students' evaluations necessarily depends on the
content of the evaluation items. Student ratings, like the teaching that
they represent, should be viewed as a multidimensional construct (e.g., a
teacher may be quite well organized but lack enthusiasm). This contention
is supported by common sense and a considerable body of empirical research.
Unfortunately, most evaluation instruments and research fail to take
cognizance of this multidimensionality. If a survey instrument contains an
ill-defined hodge-podge of different items and student ratings are
summarized by an average of these items, then there is no basis for knowing
what is being measured, no basis for differentially weighting different
components in a way that is most appropriate to the particular purpose they
are to serve, nor any basis for comparing these results with other findings.
If a survey contains separate groups of related items that are derived from
a logical analysis of the content of effective teaching and the purposes
that the ratings are to serve, or from a carefully constructed theory of
teaching and learning, and if empirical procedures such as factor analysis
demonstrate that the items within the same group do measure the same trait
and that different traits are separate and distinguishable, then it is
possible to interpret what is being measured.

In evaluating the need for multiple dimensions of students' evaluations 
it is important to consider the purposes that the ratings are intended to 
serve. Marsh (1984; 1987) noted that student ratings are used variously,
and are recommended, for purposes of: (a) formative feedback to
faculty about the effectiveness of their teaching; (b) a summative measure
of teaching effectiveness to be used in personnel decisions; (c) information
for students to use in the selection of courses and instructors; (d) an
outcome or a process description for research on teaching. Marsh (in press)
argued that for 3 of these 4 recommended uses of students' evaluations
(all but personnel decisions) there appears to be general agreement that
appropriately constructed multiple dimensions are more useful than a
single summary score.

For personnel decisions, there is considerable debate as to whether a
single score is more useful than a profile of scores reflecting multiple
dimensions (see Abrami, 1988; Marsh, 1987; 1989). Some researchers, while
accepting the multidimensionality of students' evaluations and the
importance of measuring separate components for some purposes such as
feedback to faculty, defend the unidimensionality of student ratings because
"when student ratings are used in personnel decisions, one decision is
made." There are, however, serious problems with this reasoning. First, the
use to which student ratings are put has nothing to do with their
dimensionality, though it may influence the form in which the ratings are to
be presented. Second, even if a single total score were the most useful
form in which to summarize student ratings for personnel decisions (and
there is no reason to assume that generally it is), this purpose would be
poorly served by an ill-defined total score based upon an ad hoc collection
of items that was not appropriately balanced with respect to the components
of effective teaching that were being measured. If a single score were to
be used, it should represent a weighted average of the different components
where the weight assigned to each component was a function of logical and
empirical analyses. Third, implicit in this argument is the suggestion that
administrators are unable to utilize or prefer not to be given multiple
sources of information for use in their deliberations, but I know of no
empirical research to support such a claim.
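
The weighted average of components mentioned above can be illustrated with a
short sketch. It is an illustration only, under stated assumptions: the
function name, the three components chosen, the ratings, and the weights are
all hypothetical, and nothing in the text prescribes a particular weighting
scheme.

```python
def weighted_composite(scores, weights):
    """Return the weighted average of component scores.

    scores  -- dict mapping component name to a class-average rating
    weights -- dict mapping the same names to non-negative weights
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical class-average ratings and hypothetical weights (assumptions).
example_scores = {
    "Organization/Clarity": 4.0,
    "Instructor Enthusiasm": 2.0,
    "Individual Rapport": 5.0,
}
example_weights = {
    "Organization/Clarity": 2.0,
    "Instructor Enthusiasm": 1.0,
    "Individual Rapport": 1.0,
}

composite = weighted_composite(example_scores, example_weights)
unweighted = sum(example_scores.values()) / len(example_scores)
```

With these made-up numbers the weighted composite differs from the simple
mean of the three components, which is the point of the argument: a total
score depends on how the components are balanced, so the balance should be
chosen deliberately rather than left to an ad hoc item collection.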

The Content of the Multiple Dimensions 
An important issue in the construction of multidimensional rating scale
instruments is the content of the dimensions to be surveyed. The most
typical approach consists of a logical analysis of the content of effective
teaching and the purposes of students' evaluations, supplemented perhaps
with literature reviews of the characteristics other researchers have found
to be useful, and feedback from students and faculty. An alternative
approach based on a theory of teaching or learning could be used to posit
the important dimensions, though such an approach does not seem to have been
used in student evaluation research. However, with each approach, it is
important to also use empirical techniques such as factor analysis to
further test the dimensionality of the ratings. The most carefully
constructed instruments combine both logical/theoretical and empirical
analyses in the research and development of student rating instruments.

The student evaluation literature does contain several examples of well
constructed instruments with a clearly defined factor structure that provide
measures of distinct components of teaching effectiveness. Some of these
instruments and the factors that they measure are:

1) Frey's Endeavor instrument (Frey, Leonard & Beatty, 1975; also see
Marsh, 1981, 1986): Presentation Clarity, Workload, Personal Attention,
Class Discussion, Organization/Planning, Grading, and Student
Accomplishments;

2) The Student Description of Teaching (SDT) questionnaire originally
developed by Hildebrand, Wilson and Dienst (1971): Analytic/Synthetic
Approach, Organization/Clarity, Instructor Group Interaction, Instructor
Individual Interaction, and Dynamism/Enthusiasm;

3) Marsh's Student Evaluations of Educational Quality (SEEQ) instrument
(Marsh, 1982b; 1983, 1984; 1987; also see Table 3 presented later):
Learning/Value, Instructor Enthusiasm, Organization/Clarity, Individual
Rapport, Group Interaction, Breadth of Coverage, Examinations/Grading,
Assignments/Readings, and Workload/Difficulty;

4) The Michigan State SIRS instrument (Warrington, 1973): Instructor
Involvement, Student Interest and Performance, Student-Instructor
Interaction, Course Demands, and Course Organization.

The systematic approach used in the development of these instruments and the
similarity of the factors which they measure support their construct
validity. Factor analyses of responses to each of these instruments provide
clear support for the factor structure they were designed to measure, and
demonstrate that the students' evaluations do measure distinct components of
teaching effectiveness. More extensive reviews describing the components
found in other research (Cohen, 1981; Feldman, 1976; Kulik & McKeachie,
1975) identify dimensions similar to those described here.

Factor analysis is a useful technique for determining what factors are
being measured, but it cannot determine whether the obtained factors are
important to the understanding of effective teaching. Consequently,
carefully developed surveys, even when factor analysis is to be used,
typically begin with item pools based upon literature reviews, and with
systematic feedback from students, faculty, and administrators about what
items are important and what type of feedback is useful (e.g., Hildebrand,
Wilson & Dienst, 1971; Marsh, 1982b). For example, in the development of
SEEQ a large item pool was obtained from a literature review, instruments in
current usage, and interviews with faculty and students about
characteristics which they see as constituting effective teaching. Then,
students and faculty were asked to rate the importance of items, faculty
were asked to judge the potential usefulness of the items as a basis for
feedback, and open-ended student comments on pilot instruments were examined
to determine if important aspects had been excluded. These criteria, along
with psychometric properties, were used to select items and revise
subsequent versions. This systematic development constitutes evidence for
the content validity of SEEQ and makes it unlikely that it contains any
irrelevant factors.

Feldman (1976; also see Feldman, 1983, 1984, 1986, 1987) categorized
the different characteristics of the superior university teacher from the
student's point of view with a systematic review of research that either
asked students to specify these characteristics or inferred them on the
basis of correlations between specific characteristics and students' overall
evaluations. On the basis of such studies, and also to facilitate
presentation of this material and his subsequent reviews of other student
evaluation research, Feldman derived a set of categories shown in Table 1.
This list provides the most extensive and, perhaps, the best set of
characteristics that are likely to underlie students' evaluations of
effective teaching. Nevertheless, Feldman used primarily a logical analysis
based on his examination of the student evaluation literature, and his
results do not necessarily imply that students can differentiate these
characteristics. Also, to actually measure all these characteristics as
separate scales would require an instrument that contained as many as 100
items and this would be unacceptable in most settings. This set of
characteristics does, however, provide a useful basis for evaluating the
comprehensiveness of the set of evaluation factors on a given instrument.

Insert Table 1 About Here

Feldman (1976) noted that factors actually identified by factor
analysis typically correspond to more than one of his categories. The
highest loading items on any given factor often come from more than one of
his categories. In Table 1 I have attempted to match Feldman's categories to
the SEEQ factors. The only categories that are apparently unrelated to any
of the SEEQ factors are teacher elocution (category 7) and, perhaps, teacher
sensitivity to class level and progress (category 8). All of the SEEQ
factors represent at least one of Feldman's categories and most reflect two
or more categories. In contrast, none of Feldman's categories reflect more
than one of the SEEQ factors. This logical analysis of the content of the
SEEQ factors and Feldman's categories demonstrates that there is substantial
overlap between the two but that Feldman's categories reflect more narrowly
defined constructs than do the SEEQ factors.

The Present Investigation 
The purpose of the present investigation is to extend previous work on
the factor structure of SEEQ responses. Factor analysis provides tests of
whether: (a) students are able to differentiate among components of
effective teaching, (b) the empirical factors confirm the factors that an
instrument is designed to measure, and (c) the same factors are identified
consistently in different settings and in different academic disciplines.
Factor scores derived from factor analysis also provide potentially useful
scores for summarizing the results of students' evaluations.

SEEQ has been used across a diverse array of academic disciplines at the
University of Southern California since 1976. For present purposes ratings
from 24,158 different classes were divided into 21 different groups varying
in terms of instructor level (teaching assistant or regular academic staff),
the course level (undergraduate or graduate), and the academic discipline.
Twenty-two separate factor analyses were conducted: one for the total group
and one for each of the 21 different subgroups. The factor structure of SEEQ
responses was evaluated by comparing the results of these factor analyses
and comparing empirically derived factor scores based on these factor
analyses. The availability of such a large data base based on a
psychometrically sound instrument provides much stronger tests of the
comparability of the factor structures across instructor level, course
level, and academic discipline than any previous research.

Methods 

Sample and Procedures. 

During the period 1976-1988, SEEQ forms were administered in
approximately 40,000 courses at the University of Southern California.
Although the use of the SEEQ form is voluntary, the University requires that
each academic unit collect some form of students' evaluations of teaching
effectiveness for all courses, and staff are not considered for promotion
unless students' evaluations are provided. Most of the academic units that
do use SEEQ require that all their staff are evaluated in all courses. The
evaluation forms are typically distributed to staff shortly before the end
of each academic term, administered and collected by a student in the class
or by a member of the academic staff according to printed instructions, and
taken to a central office where they are processed. This program, the SEEQ
instrument on which it is based, and research that led to its development
are described by Marsh (1987).

For present purposes all classes taught by regular academic staff or by
teaching assistants that were evaluated by 6 or more students were
considered. Excluded were classes evaluated by 5 or fewer students and
classes taught by teachers who were not graduate student teaching assistants
or regular academic staff (i.e., had academic titles other than assistant,
associate, or full professor). This resulted in a sample of 24,158 classes.
Each of these classes was then classified into 21 subsamples such that each
subsample had at least 400 classes. All classes were first categorized into
three general groups consisting of classes taught by teaching assistants,
undergraduate classes taught by regular faculty, and graduate courses taught
by regular faculty. Classes were then classified into divisions or schools
(e.g., Social Sciences, Engineering) and then into specific departments (e.g.,
psychology, systems engineering) wherever there were more than 400 classes.
Thus, each class was assigned to the most specific subgroup for which there
were at least 400 classes. All classes were classified into one and only one
subgroup. This procedure resulted in 7 groups of classes taught by teaching
assistants, 7 groups of undergraduate classes taught by regular faculty, and 7
groups of graduate classes taught by regular faculty (see Table 2).
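
The classification rule described above can be sketched as follows. This is
a minimal illustration, not the procedure actually run for the study: the
record keys, the group labels, and the fallback label "General" for classes
whose division is too small are assumptions, and the sketch counts all
classes in a division when deciding whether the division qualifies, a detail
the text leaves open.

```python
from collections import Counter

MIN_RATERS = 6      # classes evaluated by 5 or fewer students are excluded
MIN_CLASSES = 400   # minimum number of classes for a subgroup of its own


def assign_subgroups(classes, min_classes=MIN_CLASSES, min_raters=MIN_RATERS):
    """Map each eligible class id to (group, label) for its most specific subgroup.

    Each class is a dict with hypothetical keys: 'id', 'n_raters',
    'group' (instructor/course level), 'division', and 'department'.
    """
    eligible = [c for c in classes if c["n_raters"] >= min_raters]
    dept_counts = Counter(
        (c["group"], c["division"], c["department"]) for c in eligible
    )
    div_counts = Counter((c["group"], c["division"]) for c in eligible)

    assignment = {}
    for c in eligible:
        group, division, department = c["group"], c["division"], c["department"]
        if dept_counts[(group, division, department)] >= min_classes:
            label = department   # department is large enough on its own
        elif div_counts[(group, division)] >= min_classes:
            label = division     # fall back to the division or school
        else:
            label = "General"    # fall back to the general group
        assignment[c["id"]] = (group, label)
    return assignment


# Tiny made-up data set; the threshold is lowered to 2 so the rule is visible.
demo = [
    {"id": 1, "n_raters": 12, "group": "UG-faculty",
     "division": "Social Sciences", "department": "Psychology"},
    {"id": 2, "n_raters": 30, "group": "UG-faculty",
     "division": "Social Sciences", "department": "Psychology"},
    {"id": 3, "n_raters": 8, "group": "UG-faculty",
     "division": "Social Sciences", "department": "Sociology"},
    {"id": 4, "n_raters": 9, "group": "UG-faculty",
     "division": "Fine Arts", "department": "Music"},
    {"id": 5, "n_raters": 4, "group": "UG-faculty",
     "division": "Fine Arts", "department": "Music"},
]
demo_result = assign_subgroups(demo, min_classes=2)
```

In the demonstration, the two psychology classes form their own subgroup,
the single sociology class falls back to its division, the remaining music
class falls back to the general group, and the class with only four raters
is dropped. Every retained class lands in exactly one subgroup, as in the
text.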

Insert Table 2 About Here 

Statistical Analyses 

All analyses were performed on class-average responses for the total
sample and for each of the 21 subsamples. The factor analyses of the 35 SEEQ
items consisted of principal axis factoring with a Kaiser normalization and
iterations followed by an oblique rotation (see SPSS, 1986). For each factor
analysis empirically defined factor scores were generated using the
regression method (SPSS, 1986).

Everett and Entrekin (1980; Everett, 1983) noted that factors extracted
from responses to the same items administered to different samples should be
comparable if they are to be used as summary measures. They went on to argue
that correlations between factor scores derived from two different factor
analyses "provides a coefficient of factor comparability, which is a more
direct measure than the coefficient of congruence based upon factor
loadings" (p. 165). Everett (1983) further demonstrated that this procedure
provided a useful indication of the number of factors that should be
retained. When factor scores based on a large number of different groups are
considered, Marsh (1988) proposed a variation of this procedure that is
used in the present investigation. In this variation, factor scores based on
the total group and those based on each separate group are compared. Thus,
for each case there is a set of factor scores based on the total group and a
set of factor scores based on the particular group to which that case
belongs. Correlations between matching factors in the two sets of factor
scores provide an index of the factor comparability between the factor
analysis based on the total sample and that based on each subsample.
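
The comparability index just described amounts to a Pearson correlation
between two sets of scores for the same classes. The sketch below shows
that computation under stated assumptions: the function names and the
factor scores are made up for illustration, and it takes the factor scores
as given rather than deriving them from a factor analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def comparability(total_scores, subgroup_scores):
    """Correlate matching factors from two factor-score solutions.

    Both arguments map factor name -> list of factor scores for the same
    classes in the same order. Returns factor name -> correlation.
    """
    return {
        name: pearson(total_scores[name], subgroup_scores[name])
        for name in total_scores
    }


# Hypothetical factor scores for five classes in one subsample.
total_solution = {
    "Learning/Value": [0.1, -0.4, 1.2, 0.8, -1.0],
    "Workload/Difficulty": [1.0, 2.0, 3.0, 4.0, 5.0],
}
subgroup_solution = {
    # An exact linear rescaling of the total-group scores.
    "Learning/Value": [2 * v + 0.5 for v in total_solution["Learning/Value"]],
    # Nearly, but not exactly, the same ordering and spacing.
    "Workload/Difficulty": [1.1, 1.9, 3.2, 3.9, 5.0],
}

coefficients = comparability(total_solution, subgroup_solution)
```

Because a correlation ignores linear rescaling, the first factor yields a
coefficient of 1 even though the two solutions put the scores on different
scales, while small disagreements in the second factor pull its coefficient
slightly below 1.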

Results 

The factor analysis based on the total sample of 24,158 classes is
summarized in Tables 3 and 4. The factor analysis clearly identifies each of
the 9 factors that SEEQ is designed to measure and the factor structure is
very well defined. The 35 target loadings (factor loadings for items
designed to measure each factor, presented in boxes in Table 3) are
consistently large; the mean is .650 and every one is at least .392. The 280
nontarget loadings are consistently small; the mean is .067 and none is
larger than .245. Whereas the factor pattern correlations indicate that the
factors are positively correlated (mean r = .318), the largest correlation
is .512. These results replicate previous research with SEEQ.
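
The split into target and nontarget loadings follows from the design: 35
items on 9 factors give 35 x 9 = 315 loadings, of which 35 are target
loadings and the remaining 280 are nontarget loadings. The sketch below
shows how such summaries can be computed from a loading matrix. The function
name and the small matrix are hypothetical, and taking absolute values of
the nontarget loadings before summarizing is an assumption, since the text
does not say how negative loadings were handled.

```python
def summarize_loadings(loadings, target_factor):
    """Summarize target and nontarget loadings of a factor pattern matrix.

    loadings      -- list of rows, one per item, one loading per factor
    target_factor -- for each item, the column index of its intended factor
    """
    targets, nontargets = [], []
    for row, target in zip(loadings, target_factor):
        for column, value in enumerate(row):
            if column == target:
                targets.append(value)
            else:
                nontargets.append(value)
    abs_nontargets = [abs(v) for v in nontargets]  # assumption: magnitudes
    return {
        "n_target": len(targets),
        "n_nontarget": len(nontargets),
        "mean_target": sum(targets) / len(targets),
        "min_target": min(targets),
        "mean_abs_nontarget": sum(abs_nontargets) / len(abs_nontargets),
        "max_abs_nontarget": max(abs_nontargets),
    }


# Made-up 4-item, 2-factor pattern matrix (not SEEQ data).
demo_loadings = [
    [0.70, 0.05],
    [0.60, -0.10],
    [0.02, 0.65],
    [0.12, 0.55],
]
demo_targets = [0, 0, 1, 1]
summary = summarize_loadings(demo_loadings, demo_targets)

# Counts implied by the SEEQ design of 35 items and 9 factors.
seeq_nontarget_count = 35 * 9 - 35
```

A well-defined structure in this sense shows a high mean and minimum for the
target loadings together with a low mean and maximum for the nontarget
loadings.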



Insert Tables 3 and 4 About Here

The set of 21 factor analyses conducted on each subsample is summarized
in Table 4. For each of the 21 factor analyses the target loadings are
consistently high (means of .578 to .712), nontarget loadings are
consistently low (means of .062 to .076), and factor correlations are
moderate (means of .257 to .399). The results are similar in the 7 sets of
courses taught by teaching assistants, the 7 sets of undergraduate courses
taught by regular faculty, and the 7 sets of graduate courses taught by
regular faculty. The similarity of the factor structures across the 22
different factor analyses provides remarkably strong support for the
generality of the SEEQ factor structure and much stronger support than has
been demonstrated with any other student evaluation instruments.

Although not emphasized in Table 4, there is also a consistent pattern
for the nontarget loadings in the 22 factor analyses. The overall instructor
rating and the overall course rating are most strongly related to the
Instructor Enthusiasm and Learning/Value factors respectively. Not
surprisingly, however, these overall rating items typically have moderate
factor loadings on several other factors. For example, in the total group
factor analysis (Table 3) the largest two nontarget loadings are for the
overall instructor item on the Organization/Clarity factor and for the
overall course item on the Instructor Enthusiasm factor. Across all the
factor analyses one of the overall rating items had the largest nontarget
loadings in 13 of the 22 analyses. These results demonstrate that the
overall rating items in particular are not pure indicators of any of the
factors.

Inspection of Table 3 indicates that most of the nontarget loadings are
close to zero or modestly positive. However, nearly half of the nontarget
loadings for the Workload/Difficulty items on the remaining SEEQ factors and
the nontarget loadings for the remaining 31 items on the Workload/Difficulty
factor are negative. Similarly, the lowest correlations among SEEQ factors
consistently involve the Workload/Difficulty factor. These patterns observed
for the total group factor analysis are found consistently in the 21
subgroup factor analyses.

Two sets of factor scores representing the nine SEEQ factors were
generated for each of the 24,158 classes. One set of factor scores was based
on the factor analysis of the total sample presented in Table 3 and the
second set of factor scores was based on the particular subsample to which
the class belonged. These two sets of factor scores were then correlated for
each of the 21 different subsamples. High correlations between factor scores
representing the same factor provide support for the comparability of the
different factor structures. Nearly all of these 189 correlations are greater
than .95 and a majority are larger than .99 (Table 5). Low correlations
between factor scores representing different factors provide support for the
differentiation among the factors. The mean correlations among these
nonmatching factor scores vary from .254 to .446 for the 21 subsamples.

Insert Table 5 About Here

Discussion 

Research reviewed here suggests that most, if not all, of the
recommended purposes of students' evaluations of teaching effectiveness are
better served by an appropriately constructed set of multiple dimensions
than by a single summary score. Most evaluation instruments, however, do not
measure a well-defined set of evaluation factors and most research does not
incorporate this multidimensional perspective. An important exception to
this generalization is the SEEQ instrument and research based upon it that
are the focus of the present investigation. Four observations support the
appropriateness of the factors used to summarize SEEQ responses. First,
empirical research supports the SEEQ factor structure. Second, the
systematic development of the SEEQ instrument supports the content validity
of its factors. Third, factor analyses of other instruments designed to
measure multiple dimensions of teaching effectiveness result in factors like
those identified by SEEQ responses. Finally, there is substantial agreement
between the content of empirically derived SEEQ factors and the categories
of effective teaching developed by Feldman (1976).

The purpose of the present investigation was to extend previous
research by evaluating the generality of the SEEQ factor structure across a
very large and diverse set of different classes. The SEEQ factor structure
was well defined for the total sample of 24,158 classes and this result
replicates previous research based on smaller samples. Of particular
importance was the finding that the factor structure was also well defined
for 21 subgroups that varied in terms of instructor level, course level, and
academic discipline. The nine factors that SEEQ is designed to measure were
identified in all 22 factor analyses and factor scores based on the total
group analysis were almost perfectly correlated with those based on each of
the 21 subgroup analyses. Because of the psychometric properties of the SEEQ
instrument and because of the size and diversity of the data base considered
here, the results provide much stronger support for the generality of the
factor structure underlying students' evaluations of teaching effectiveness
than does any previous research.



REFERENCES

Abrami, P. C. [year illegible]. SEEQing the truth about student ratings of
    instruction. Educational Researcher, 18, 43-45.

Cohen, P. A. (1981). Student ratings of instruction and student achievement:
    A meta-analysis of multisection validity studies. Review of Educational
    Research, 51, 281-309.

Everett, J. E. (1983). Factor comparability as a means of determining the
    number of factors and their rotation. Multivariate Behavioral Research,
    18, 197-218.

Everett, J. E., & Entrekin, L. V. (1980). Factor comparability and the
    advantages of multiple group factor analysis. Multivariate Behavioral
    Research, 15, 165-180.

Feldman, K. A. (1976). The superior college teacher from the student's view.
    Research in Higher Education, 5, 243-288.

Feldman, K. A. (1983). The seniority and instructional experience of college
    teachers as related to the evaluations they receive from their students.
    Research in Higher Education, 18, 3-124.

Feldman, K. A. (1984). Class size and students' evaluations of college
    teachers and courses: A closer look. Research in Higher Education, 21,
    45-116.

Feldman, K. A. (1986). The perceived instructional effectiveness of college
    teachers as related to their personality and attitudinal characteristics:
    A review and synthesis. Research in Higher Education, 24, 139-213.

Feldman, K. A. (1987). Research productivity and scholarly accomplishment: A
    review and exploration. Research in Higher Education, 227-298.

Frey, P. W., Leonard, D. W., & Beatty, W. W. (1975). Student ratings of
    instruction: Validation research. American Educational Research Journal,
    12, 327-336.

Hildebrand, M., Wilson, R. C., & Dienst, E. R. (1971). Evaluating
    university teaching. Berkeley: Center for Research and Development in
    Higher Education, University of California, Berkeley.

Kulik, J. A., & McKeachie, W. J. (1975). The evaluation of teachers in
    higher education. In Kerlinger (Ed.), Review of Research in Education
    (Vol. 3). Itasca, IL: Peacock.

Marsh, H. W. (1981). Students' evaluations of tertiary instruction: Testing
    the applicability of American surveys in an Australian setting.
    Australian Journal of Education, 25, 177-192.

Marsh, H. W. (1982b). SEEQ: A reliable, valid, and useful instrument for
    collecting students' evaluations of university teaching. British Journal
    of Educational Psychology, 52, 77-95.

Marsh, H. W. (1984). Students' evaluations of university teaching:
    Dimensionality, reliability, validity, potential biases, and utility.
    Journal of Educational Psychology, 76, 707-754.

Marsh, H. W. (1986). Applicability paradigm: Students' evaluations of
    teaching effectiveness in different countries. Journal of Educational
    Psychology, 78, 465-473.

Marsh, H. W. (1987). Students' evaluations of university teaching: Research
    findings, methodological issues, and directions for future research.
    International Journal of Educational Research, 11, 253-388. (Whole Issue
    No. 3)

Marsh, H. W. (1989). Responses to reviews of "Students' evaluations of
    university teaching: Research findings, methodological issues, and
    directions for future research." Instructional Evaluation, 10, 5-9.

SPSS (1986). SPSSx User's Guide. New York: McGraw-Hill.

Warrington, W. G. (1973). Student evaluation of instruction at Michigan
    State University. In A. L. Sockloff (Ed.), Proceedings: The first
    invitational conference on faculty effectiveness as evaluated by students
    (pp. 164-182). Philadelphia: Measurement and Research Center, Temple
    University.




Table 1

Categories of Effective Teaching Adapted From Feldman (1976, 1983, 1984,
1986, 1987) and the SEEQ Factors Most Closely Related to Each Category

Feldman's Categories                                    SEEQ Factors

 1) Teacher's stimulation of interest in the course    Instructor Enthusiasm
    and subject matter.
 2) Teacher's enthusiasm for subject or for teaching.  Instructor Enthusiasm
 3) Teacher's knowledge of the subject.                Breadth of Coverage
 4) Teacher's intellectual expansiveness and breadth   Breadth of Coverage
    of coverage.
 5) Teacher's preparation and organization of the      Organization/Clarity
    course.
 6) Clarity and understandableness of presentations    Organization/Clarity
    and explanations.
 7) Teacher's elocutionary skills.                     None
 8) Teacher's sensitivity to, and concern with,        None
    class level and progress.
 9) Clarity of course objectives and requirements.     Organization/Clarity
10) Nature and value of the course material            Assignments/Readings
    including its usefulness and relevance.
11) Nature and usefulness of supplementary             Assignments/Readings
    materials and teaching aids.
12) Perceived outcome or impact of instruction.        Learning/Value
13) Teacher's fairness and impartiality of             Examinations/Grading
    evaluation of students; quality of exams.
14) Personal characteristics (personality).            Instructor Enthusiasm
15) Nature, quality and frequency of feedback          Examinations/Grading
    from teacher to students.
16) Teacher's encouragement of questions and           Group Interaction
    discussion, and openness to the opinions
    of others.
17) Intellectual challenge and encouragement           Learning/Value
    of independent thought.
18) Teacher's concern and respect for students;        Individual Rapport
    friendliness of the teacher.
19) Teacher's availability and helpfulness.            Individual Rapport
20) Difficulty and workload of the course.             Workload/Difficulty

Note. The actual categories used by Feldman in different studies (e.g.,
Feldman, 1976, 1983, 1984, 1986, 1987) varied somewhat. Categories 12 and 14
were not included in Feldman (1976) but were included in subsequent studies
whereas category 20 was included by Feldman (1976) but not subsequent
studies. One other category (classroom management) was only included by
Feldman (1976).

a Whereas these SEEQ factors most closely match the corresponding
categories, the match is apparently not particularly close.






Table 2

Summary of the 21 Subsamples of Courses

      N of Classes   Academic Unit

Undergraduate Courses Taught By Teaching Assistants
  1        431       General
  2        610       Business
  3        565       Humanities
  4       1606       Social Sciences
  5        683       Spanish and Portuguese
  6       1368       Economics
  7        902       Communication

Undergraduate Courses Taught By Regular Faculty
  1       1421       General
  2       2326       Business
  3        956       Humanities
  4       2320       Social Sciences
  5       1693       Engineering
  6        590       History
  7        538       Psychology

Graduate Courses Taught By Regular Faculty
  1        757       General
  2       2049       Business
  3       1157       Social Sciences
  4        957       Engineering
  5       1213       Education
  6        457       Systems Engineering
  7       1559       Safety and Systems Management

Total    24158

Note. For present purposes all classes with 6 or more sets of ratings were
classified into 21 subsamples such that each subsample had at least 400
classes. All classes were first categorized into general groups consisting
of classes taught by teaching assistants, undergraduate classes taught by
regular faculty, and graduate courses taught by regular faculty. Classes
were then classified into divisions or schools (e.g., Social Sciences or
Engineering) and then into specific departments (e.g., Psychology or Systems
Engineering) whenever there were more than 400 classes. All classes were
classified into one and only one subsample.
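
The constraints stated in the note to Table 2 (21 subsamples, each with at least 400 classes, together accounting for all 24,158 classes) can be checked directly from the tabled counts. The following is a minimal sketch, not part of the original analysis. The dictionary layout and the names SUBSAMPLES, MIN_CLASSES, and summarize are illustrative assumptions, and the pairing of the graduate-course counts with academic units assumes that both are listed in the same order in the table.

```python
# Subsample sizes transcribed from Table 2: group -> {academic unit: N of classes}.
# Assumption: graduate-course counts are paired with units in listed order.
SUBSAMPLES = {
    "Undergraduate, teaching assistants": {
        "General": 431, "Business": 610, "Humanities": 565,
        "Social Sciences": 1606, "Spanish and Portuguese": 683,
        "Economics": 1368, "Communication": 902,
    },
    "Undergraduate, regular faculty": {
        "General": 1421, "Business": 2326, "Humanities": 956,
        "Social Sciences": 2320, "Engineering": 1693,
        "History": 590, "Psychology": 538,
    },
    "Graduate, regular faculty": {
        "General": 757, "Business": 2049, "Social Sciences": 1157,
        "Engineering": 957, "Education": 1213,
        "Systems Engineering": 457, "Safety and Systems Management": 1559,
    },
}

MIN_CLASSES = 400  # minimum subsample size stated in the table note


def summarize(subsamples):
    """Return (number of subsamples, total classes, smallest subsample size)."""
    sizes = [n for units in subsamples.values() for n in units.values()]
    return len(sizes), sum(sizes), min(sizes)


count, total, smallest = summarize(SUBSAMPLES)
print(count, total, smallest)  # 21 24158 431
```

The three group totals (6,165, 9,844, and 8,149 classes) sum to the 24,158 classes reported for the total sample, and the smallest subsample (431 classes) exceeds the 400-class minimum.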






Table 3

Factor Analysis Results for the Total Sample of 24,158 Sets of Class-average
Responses: Factor Loadings and Factor Correlations

                                                                 SEEQ Factors
SEEQ Scales and Items (paraphrased)          Lrn    Enth    Orgn     Grp     Ind     Brd    Exam    Asgn    Work

Learning/Value
  Course challenging & stimulating        [.4?4]    .168    .103    .015    .014    .159    .099    .155    .291
  Learned something valuable              [.607]    .083    .100    .026    .050    .103    .085    .147    .113
  Increased subject interest              [.646]    .078    .034    .039    .058    .169    .074    .131    .020
  Learned & understood subject matter     [.487]    .043    .176    .152    .045    .047    .112    .?49   -.217
  Overall Course Rating                   [.410]    .211    .173    .041    .042    .085    .166    .175    .069

Instructor Enthusiasm
  Enthusiastic about teaching               .095  [.544]    .129    .072    .195    .115    .052    .069    .025
  Dynamic and energetic                     .064  [.714]    .094    .05?    .085    .083    .069    .071    .042
  Enhanced presentations with humor         .089  [.650]   -.02?    .103    .078    .129    .090    .054   -.045
  Teaching style held your interest         .137  [.581]    .187    .131    .026    .050    .110    .073    .017
  Overall Instructor Rating                 .172  [.392]    .245    .083    .141    .09?    .140    .075    .039

Organization/Clarity
  Lecturer explanations clear               .146    .145  [.510]    .176    .060    .075    .079    .104   -.072
  Materials well explained & prepared       .069    .037  [.677]    .060    .075    .073    .094    .118    .005
  Course objectives stated & pursued           ?       ?     [?]    .055    .070    .065    .175    .184    .024
  Lectures facilitated taking notes         .031    .040  [.585]   -.093    .049    .175    .146    .044    .020

Group Interaction
  Encouraged class discussion               .058    .103    .011  [.769]    .070    .033    .067    .080    .002
  Students shared knowledge/ideas              ?       ?       ?     [?]    .095    .093    .043    .073   -.029
  Encouraged questions & gave answers       .059    .105    .167  [.583]    .151    .094    .100    .080    .001
  Encouraged expression of ideas            .045    .069    .035  [.674]    .182    .110    .094    .070   -.013

Individual Rapport
  Friendly towards individual students      .051    .161   -.001    .176  [.612]    .063    .112    .057   -.038
  Welcomed students seeking help/advice        ?       ?       ?    .070  [.786]    .036    .093    .059   -.007
  Interested in individual students         .086    .140    .001    .137  [.647]    .057    .138    .059    .004
  Accessible to individual students        -.014   -.028    .139    .037  [.636]    .099    .136    .104    .010

Breadth of Coverage
  Contrasted various implications           .043    .037    .118    .059    .068  [.676]    .077    .109    .065
  Gave background of ideas/concepts         .087    .085    .134    .020    .044  [.662]    .056    .122    .004
  Gave different points of view             .035    .066    .086    .123    .101  [.636]    .097    .113   -.004
  Discussed current developments            .207    .113    .018    .086    .039  [.562]    .084    .040    .000

Examinations/Grading
  Examination feedback valuable                ?       ?       ?       ?    .101    .028  [.670]    .088    .044
  Evaluation methods fair/appropriate       .047    .044    .011    .043    .107    .078  [.749]    .099   -.033
  Tested course content as emphasized       .063    .036    .129    .034    .064    .047  [.643]    .146   -.026

Assignments/Readings
  Readings/texts were valuable             -.008   -.004    .019    .022    .018    .053    .025  [.885]   -.003
  They contributed to understanding         .127    .021    .036    .027    .039    .012    .140  [.716]    .072

Workload/Difficulty
  Course difficulty (easy-hard)                ?       ?       ?       ?       ?    .096    .015    .018  [.861]
  Course workload (light-heavy)             .100   -.054    .004    .085   -.001    .002   -.035    .038  [.907]
  Course pace (slow-fast)                      ?       ?       ?       ?    .005   -.001    .035    .040  [.689]
  Hours per week outside of class           .148   -.044   -.085    .034   -.001   -.?06   -.006    .042  [.798]

Factor Pattern Correlations
                                             Lrn    Enth    Orgn     Grp     Ind     Brd    Exam    Asgn    Work
Learning/Value                             1.000
Instructor Enthusiasm                       .434   1.000
Organization/Clarity                        .407    .427   1.000
Group Interaction                           .350    .364    .210   1.000
Individual Rapport                          .263    .400    .331    .455   1.000
Breadth of Coverage                         .449    .419    .454    .327    .352   1.000
Examinations/Grading                        .401    .392    .511    .315    .493    .403   1.000
Assignments/Readings                        .488    .319    .431    .312    .338    .418    .510   1.000
Workload/Difficulty                         .128    .076    .044   -.072   -.009    .106    .033    .154   1.000

Note. Target loadings, the factor loadings of items designed to define each
SEEQ factor, are presented in brackets. A question mark stands for a digit
or entry that is illegible in the source copy.
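
The factor correlations in the lower half of Table 3 are the values that Table 4 condenses into a maximum, a minimum, and a mean for the total sample. The following is a minimal sketch of that summary, not the authors' procedure. The list LOWER_TRIANGLE and the function name summarize_correlations are illustrative, the values are transcribed from the correlation matrix in Table 3 with the diagonal omitted, and the mean is assumed to be the simple signed mean of the 36 off-diagonal correlations.

```python
# Off-diagonal factor correlations transcribed from Table 3 (diagonal omitted).
# Row order: Enth, Orgn, Grp, Ind, Brd, Exam, Asgn, Work; each row lists the
# correlations with the factors that precede it (Lrn first).
LOWER_TRIANGLE = [
    [.434],
    [.407, .427],
    [.350, .364, .210],
    [.263, .400, .331, .455],
    [.449, .419, .454, .327, .352],
    [.401, .392, .511, .315, .493, .403],
    [.488, .319, .431, .312, .338, .418, .510],
    [.128, .076, .044, -.072, -.009, .106, .033, .154],
]


def summarize_correlations(rows):
    """Return (max, min, signed mean) of all off-diagonal correlations."""
    values = [r for row in rows for r in row]
    return max(values), min(values), sum(values) / len(values)


largest, smallest, mean = summarize_correlations(LOWER_TRIANGLE)
print(f"{largest:.3f} {smallest:.3f} {mean:.3f}")  # 0.511 -0.072 0.318
```

The signed mean of .318 agrees with the mean factor correlation listed for the total sample in Table 4. The largest and smallest values computed here (.511 and -.072) differ in the third decimal from the extremes listed in that row.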






Table 4

Summary of Factor Analyses Conducted on the Total Sample and on Each of the
21 Subsamples: Target Loadings, Nontarget Loadings, and Factor Correlations

             Target Loadings       Nontarget Loadings     Factor Correlations      N of
             Max     Min    Mean     Max     Min    Mean     Max     Min    Mean  Classes

Undergraduate Courses Taught By Teaching Assistants
1           .940    .364    .578   .224a   -.142    .076    .575    .012    .388      431
2           .925    .415    .667   .246a   -.220    .066       ?       ?       ?      610
3           .96?    .332    .597    .278   -.312    .072    .561   -.190    .304      565
4           .900    .399    .662   .246a   -.195    .063       ?       ?       ?     1606
5           .871    .285    .622    .261   -.276    .067       ?   -.03?       ?      683
6           .893    .351    .620    .272   -.291    .068       ?       ?       ?     1368
7           .888    .365    .630   .275a   -.227    .069       ?       ?       ?      902
Mn          .912    .359    .625    .257   -.238    .068       ?       ?       ?

Undergraduate Courses Taught By Faculty
1           .918    .414    .650   .224a   -.166    .066    .538    .011    .312     1421
2           .881    .408    .662   .217a   -.240    .069    .516   -.117    .317     2326
3           .892    .364    .633    .236   -.232    .071    .548    .057    .342      956
4           .881    .366    .653    .207   -.215    .068    .513   -.272       ?     2320
5           .933    .336    .642   .213a   -.226    .067    .509    .008    .314     1693
6           .925    .385    .666   .292a   -.300    .063    .500   -.010    .264      590
7           .947    .315    .617    .326   -.276    .072    .505   -.099    .313      538
Mn          .911    .370    .646    .245   -.236    .068    .520   -.060    .3?0

Graduate Courses Taught By Faculty
1           .959    .337    .640   .248a   -.142    .067    .530   -.079    .303      757
2           .924    .367    .672   .231a   -.206    .067    .527   -.092    .292     2049
3           .927    .348    .642   .256a   -.176    .067    .568   -.009    .313     1157
4           .829    .303    .605    .412   -.209    .067    .552   -.172    .270      957
5           .885    .359    .643    .271   -.084    .074    .511   -.092    .304     1213
6           .933    .438    .712    .251   -.154    .062    .469   -.128    .257      457
7           .947    .335    .623   .302a   -.283    .070    .543   -.114    .324     1559
Mn          .915    .3??    .648    .282   -.189    .068    .528       ?    .295

Total Sample
            .907    .392    .650   .245a   -.218    .067    .512   -.073    .318    24158

Note. Factor analyses were conducted on the total sample and on
each of 21 subsamples. Factor loadings and factor correlations from these 22
factor analyses are summarized here. A question mark stands for a digit or
entry that is illegible in the source copy.

a The largest nontarget loading was for either the overall teacher rating or
the overall course rating.
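
The summary statistics in Table 4 separate target loadings (each item on the factor it was designed to measure) from nontarget loadings (the same item on every other factor). The following is a minimal sketch of that split, not the authors' SPSS procedure. The function summarize_loadings, its arguments, and the small 4-item, 2-factor pattern matrix are hypothetical and serve only as illustration. The means are assumed to be simple signed means.

```python
def summarize_loadings(pattern, target_factor):
    """Split a factor pattern matrix into target and nontarget loadings.

    pattern: list of rows, one per item, each a list of loadings per factor.
    target_factor: for each item, the index of the factor it was designed
        to measure.
    Returns a dict mapping "target" and "nontarget" to (max, min, mean).
    """
    target, nontarget = [], []
    for row, t in zip(pattern, target_factor):
        for j, loading in enumerate(row):
            (target if j == t else nontarget).append(loading)

    def stats(values):
        # Signed mean (an assumption about how the tabled means were formed).
        return max(values), min(values), sum(values) / len(values)

    return {"target": stats(target), "nontarget": stats(nontarget)}


# Hypothetical 4-item, 2-factor example (illustrative values, not study data).
pattern = [
    [0.70, 0.10],
    [0.60, -0.05],
    [0.15, 0.80],
    [0.00, 0.50],
]
target_factor = [0, 0, 1, 1]
summary = summarize_loadings(pattern, target_factor)
```

Applied to the 35 x 9 pattern matrix of one analysis, a function of this kind would yield one row of Table 4. A well-defined solution shows uniformly high target loadings and nontarget loadings near zero.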




Table 5

Correlations Between Factor Scores Based on the Total Sample and Based on
the 21 Individual Samples

                       Student Evaluation Factors                        Mean       Mean
         Lrn   Enth   Orgn    Grp    Ind    Brd   Exam   Asgn   Work   Match  Non-Match

Undergraduate Courses Taught By Teaching Assistants
1          ?      ?      ?      ?   .987   .989   .988   .992   .996    .991       .446
2          ?      ?      ?      ?      ?   .997   .988   .996   .996    .994       .310
3       .799   .9??   .979   .975   .993   .990   .990   .903   .986    .954       .338
4       .990   .997   .993   .995   .994   .994   .996   .997   .996    .995       .314
5       .984   .994   .990   .989   .998   .993   .983   .992   .993    .991       .349
6          ?      ?      ?   .998   .993   .99?   .995   .995   .995    .993       .368
7       .980   .995   .995   .995   .997   .994   .991   .998   .991    .993       .360
Mn      .960   .990   .990   .991   .994   .993   .990   .982   .993    .987       .355

Undergraduate Courses Taught By Regular Faculty
1          ?      ?      ?   .999   .998   .997   .997   .993   .992    .995       .318
2          ?      ?      ?   .997   .999   .996   .999   .991   .996    .996       .341
3          ?      ?      ?      ?      ?   .995   .996   .995   .994    .995       .379
4       .996   .999   .998   .999   .999   .996   .999   .999   .996    .998       .350
5          ?      ?      ?      ?      ?   .995   .997   .998   .992    .995       .328
6       .982   .980   .972   .995   .997   .984   .991   .946   .995    .982       .272
7          ?      ?      ?      ?   .990   .990   .994   .953   .995    .973       .359
Mn      .981   .993   .985   .995   .997   .993   .996   .982   .994    .991       .335

Graduate Courses Taught By Regular Faculty
1       .987   .991   .965   .997   .991   .991   .986   .994   .995    .989       .322
2          ?      ?      ?      ?   .998   .991   .998   .993   .993    .994       .311
3       .976   .994   .977   .996   .995   .987   .998   .989   .997    .990       .329
4       .948   .983   .992   .994   .996   .982   .975   .7??   .917    .951       .323
5       .979   .996   .991   .989   .992   .994   .997   .987   .996    .991       .318
6          ?      ?   .994   .993   .994   .983   .995   .994   .994    .993       .254
7          ?      ?      ?      ?   .993      ?      ?      ?   .996    .991          ?
Mn      .980   .991   .985   .995   .994   .989   .992   .961   .984    .986       .316

Note. Factor scores were generated from the factor analyses of the total
sample and each of 21 subsamples comprising the total. Correlations between
matching factor scores from the total sample and subsample analyses are
presented for each factor along with the mean of the correlations among
non-matching factors. A question mark stands for a digit or entry that is
illegible in the source copy.
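
The comparison described in the note to Table 5 can be illustrated with a short computation: each class receives two sets of factor scores, one from the total-sample solution and one from its subsample solution, and the two sets are then correlated. The following is a minimal sketch under stated assumptions, not the authors' procedure. The functions pearson and match_nonmatch and the two-factor, five-class data are hypothetical, and the non-match mean is assumed to be the simple mean of the correlations between differently named factors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def match_nonmatch(total_scores, sub_scores):
    """Correlate factor scores from two solutions for the same classes.

    Both arguments map a factor name to a list of scores, one per class.
    Returns (dict of matching-factor correlations, mean of the
    correlations between non-matching factors).
    """
    names = list(total_scores)
    matching = {f: pearson(total_scores[f], sub_scores[f]) for f in names}
    non = [pearson(total_scores[f], sub_scores[g])
           for f in names for g in names if f != g]
    return matching, sum(non) / len(non)


# Hypothetical scores for five classes on two factors (not study data).
total_scores = {"Lrn": [1, 2, 3, 4, 5], "Enth": [3, 5, 1, 4, 2]}
sub_scores = {"Lrn": [1.1, 2.0, 2.9, 4.2, 5.0],
              "Enth": [3.1, 4.9, 1.2, 4.0, 2.0]}
matching, nonmatch_mean = match_nonmatch(total_scores, sub_scores)
```

In this toy example the matching correlations exceed .99 while the non-matching mean is far smaller in absolute value. This is the pattern Table 5 reports, where the match correlations are mostly near .99 and the non-match means are roughly .25 to .45.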






